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ABSTRACT 



Measuring time delays between the multiple images of gravitationally lensed quasars is now recognized as a competitive way to 
constrain the cosmological parameters, well complementary with other cosmological probes. This requires long and well sampled 
optical light curves of numerous lensed quasars, e.g., as obtained by the COSMOGRAIL collaboration. High-quality data from our 
monitoring campaign calls for novel numerical techniques to robustly measure the delays as well as the associated random and 
systematic uncertainties, even in presence of microlensing variations. We propose three different point estimators to measure time 
delays, that are explicitly designed to handle light curves with extrinsic variability. These methods share a common formalism, which 
enables them to process data from «-image lenses. As the estimators rely on significantly contrasting ideas, we expect them to be 
sensitive to different bias sources. For each method and data set, we empirically estimate both the precision and accuracy (bias) of 
the time delay measurement using simulated light curves with known time delays that closely mimic the observations. Eventually, we 
test the self-consistency of our approach, and we demonstrate that our bias estimation is serviceable. These new methods, including 
the empirical uncertainty estimator, will represent the standard benchmark for the analysis of the COSMOGRAIL light curves. 

Key words. Methods: data analysis - Gravitational lensing: strong - cosmological parameters 



1. Introduction 

In the era of precision cosmology, in which a concordance model 
seems to fit independant observations, it is of utmost importance 
to both confront and combine all possible methods that constrain 
cosmological parameters. Confronting them yields an invaluable 
cross-check of the methods and the model. Combining them al- 
lows to break the degeneracies inherent to single techniques. 
Probes including baryonic acoustic oscillations, weak lensing, 
supernovae, and cosmic microwave background measurements 
fit exactly in this context. 

Also among these probes is the so-called "time-delay 
method", first proposed by Refsdal (1964) to measure cosmo- 
logical distances independently of any standard candle. In prac- 
tice, the method uses strongly lensed quasars with significant 
photometric variability. Photons emitted by the source quasar 
propagate towards us along different optical paths, resulting in 
multiple images. The light travel times associated to these im- 
ages differ due to (i) the different path lengths, and (ii) the dif- 
ferent Shapiro delays induced by the gravitational field of the 
lensing galaxy. As a consequence, the same quasar variability is 
seen with distinct time shifts in the light curves of the multiple 
quasar images. This paper presents methods to infer the relative 
time delays between the quasar images, from such resolved, i.e. 
unblended, light curves. 

Measured time delays, in combination with deep HST imag- 
ing and dynamical information on the lensing galaxy lead to 
competitive measurement of the Hubble constant Ho (e.g., Suyu 
|et al.|2 009 2010). The complementarity between quasar time de- 
lays and several other cosmological probes has been illustrated 



recently by Linder ( 201 1) who points out that the dark energy 
figure of merit of a combination of Stage III experiments is im- 
proved by a factor of 5 if 150 quasar time delays are added. This 
also holds if the Universe is not assumed to be flat. It is notewor- 
thy that adding this time delay information is very cheap com- 
pared to other Stage III or IV projects. 

The COSMOGRAIL collaboration has now gathered almost 
a decade of photometric points for about 30 lensed quasars. With 
such data, the time delays can in most cases clearly be seen "by 
eye". The data analysis is not anymore about sorting out which 
time delay is the best among several plausible yet incompatible 
possibilities, but rather to perform an accurate measurement of 
the delay that can be reliably used for cosmology. New curve 
shifting techniques must be devised to extract the delays from 
such curves, which sometimes include a thousand points and 
typically display substantial microlensing variability due to stars 
of the lensing galaxy. 

In this paper we present three independent curve shifting 
algorithms that can deal with extrinsic variability. Our motiva- 
tion behind the development of several techniques is to provide 
a range of methods that rely on different principles. While the 
methods might not be free of systematics, we expect them to be 
biased in different ways, and we devote a large part of this work 
to the estimation of comprehensive error bars. Comparing the 
results from different curve shifting techniques will allow us in 
particular to systematically cross-check our quantification of the 
biases. 

Our paper is structured as follows: Section [2] gives an 
overview of features to be expected in light curves, most of them 
complicating the time delay extraction problem. We then present 
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the point estimation formalism that is common to our curve shift- 
ing techniques in Section [3] Sections [4] to [6] describe the three 
techniques, and we explain how we consistently compute error 
bars for each time delay and technique in Section|7] We confront 
our techniques as well as the associated uncertainty estimates in 
SectionJH] using a set of simulated light curves with known time 
delays. Finally, we present a summary and our conclusions in 
Section[9] 

2. Light curves of lensed quasars 

The COSMOGRAIL monitoring program obtains decade-long 



light curves from 1-2 meter class telescopes (e.g., Courbin et al. 



201 1 ). This data is reduced in a homogeneous way using decon- 
volution photometry. In the following we enumerate properties 
and effects that will or might be observed in light curves from 
such ground based optical observations. 

1. Sampling and season gaps: the data have irregular 
sampling, spaced on average by 3-4 days for typical 
COSMOGRAIL curves. By construction of the light curves, 
all quasar images of a lensing system are observed at the 
same epochs. The sampling can show some amount of pe- 
riodicity on the scale of one day, as the targets tend to be 
observed always at optimal airmass. Almost all light curves 
are also affected by gaps of 2 to 5 months, corresponding to 
a period of the year where the lens is not observable. 

2. Time delays: by definition, the time delays produce a time- 
shifted version of the original variability curve of the source 
quasar. These delays range from hours to years. Depending 
on the length of the delays and the non-visibility gaps, the in- 
trinsic light curve of the quasar source may be fully or only 
partially sampled by the observations. The size of the "over- 
lap" regions, in which several quasar images follow the same 
intrinsic source variability, strongly varies from one lens to 
another. For delays of roughly half a year (modulo one year), 
the observing seasons from one light curve fall into the gaps 
of the other one. This certainly exacerbates, and sometimes 
prohibits, the delay measurement. 

3. Macrolensing image flux ratios: the different strong lens- 
ing magnifications, as well as possible absorption by the 
lens galaxy or lensing perturbations by its satellites, yield 
stationary flux ratios between the lensed quasar images. As 
the light curves are usually manipulated in magnitude scales, 
this translates into magnitude shifts between the curves. 

4. Variable microlensing: stars moving in the lensing galaxy 
act as secondary lenses that induce independent flickering of 
the light curve of each image. While this effect is interesting 
in itself as a tool to zoom on the lensed quasar and/or to probe 



the mass of these microlenses (see, e.g., 


Refsdal et al. 2000 


Wambsganss et al.|2000 Eigenbrod et a 


l.|2008a||Eigenbrod 


et al.||2008b|and, for a short review, Kochanek et al.|2007 ), 



Microlensing variations can occur on a broad range of time- 
scales. We will refer to slow microlensing when speaking 
about any extrinsic variability that happens on time scales 
significantly larger than the intrinsic variability of the quasar. 
In extreme cases, microlensing can dominate the intrinsic 
variability of the quasar at all observable scales, preventing 
us from measuring time delays (e.g., Morgan et al.|2012) . 



Variable source structure: the light magnification by mi- 
crolensing depends on source structure and size, i.e., micro- 
caustics due to stars may occur on spatial scales compara- 
ble in size with the quasar. In other words, in each source 



image, microlensing predominantly magnifies different parts 
of the source. As the total intrinsic luminosity of the quasar 
fluctuates, the light emitting region might physically change 
in shape and size, on the same time scales. This can intro- 
duce mismatch between the light curves, that correlates with 
the intrinsic variability of the quasar, or motion of its com- 
ponents ( |Schechter et al.|2003) >. In particular, intrinsic vari- 
ability patterns might be seen with different amplitudes in 
the light curves (see Barkana 1997 also for a curve shifting 
method that tackles this issue). 

6. Spurious additive flux: the photometry of the quasar im- 
ages might suffer from light contamination by the lensing 
galaxy, or by the lensed images of the quasar host galaxy, re- 
sulting in constant additive shifts influx (not in magnitude) 
in the light curves. If in addition the photometric points are 
obtained from different telescopes or instruments, i.e., differ- 
ent resolutions, filters and CCDs, these flux shifts might well 
be different for each setup. 

7. Flux sharing: this occurs in narrow blends of quasar im- 
ages. The effect is due to the limited ability of photomet- 
ric methods in separating the flux of individual images in 
such blends: while the total flux of the blend is very well 
measured, one observes random transfer of flux between the 
components, leading to measurable anticorellated "noise" in 
the light curves. This problem is accentuated by bad seeing 
conditions. 

8. Photometric calibration errors: positively correlated 
"noise" between the light curves, due to noise/inaccuracies 
in the photometric normalization, i.e. magnitude zero point, 
of each observing epoch. This normalization is carried out 
using stars in the field of view. Small variability of some of 
the considered stars, as well as color terms that are unac- 
counted for, can contribute to errors in the relative flux cal- 
ibration of the CCD frames. Such correlated noise, and also 
the negatively correlated noise described under point 7, are 
particularly problematic when attempting to measure time 
delays shorter than the typical light curve sampling intervals. 

None of these effects is anything new. They affect all past 
and present optical monitoring programs. However, their signif- 
icance increases with the quality of the data. 

When considering only points 1 to 3 of the above, the prob- 
lem of extracting time delays from noisy light curves is easy to 
formulate, as it literally corresponds to "curve shifting" along the 
time and magnitud axes. A large variety of methods have been 
proposed to tackle the problem, from cross-corelations to simul- 
taneous model fits. |Hirv et al. ( |201 \) provide an overview of the 
different existing approaches, and present an algorithm based on 
the optimal prediction technique by |Press et aX] ( |1992[ ). Recent 
works focusing on the statistical tools include a bayesian estima- 
tion scheme ( |Harva & Ra ychaudhury 2008 ), and a kernel-based 
approach combined with an evolutionary algorithm ( |Cuevas-| 
|Tello et al.|20T0l >. 

Only few of the existing techniques address the problem of 
extrinsic variability due to microlensing, or at least acknowledge 
this variability in their time delay uncertainty estimation. This 
can be attributed to the lack of long light curves of sufficiently 
high quality to clearly exhibit extrinsic variability. If incorpo- 
rated, models for microlensing variability were kept very sim- 
ple (e.g., linear trends). A notable exception is the method of 
Morgan et al. (2008), which uses a physical microlensing model 
in a bayesian formalism dKochanek 2004). However, the latter 
has a high computational cost, prohibitive in the case of decade 
long curves of quadruply lensed quasars. 
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The methods presented in the following sections share a 
pragmatic approach regarding the mathematical representation 
of extrinsic variability. In this paper we are not interested in 
any modeling of microlensing but rather want to minimize and 
estimate its effect on the time delay measurement. Our meth- 
ods should not be used to evaluate the odds of mutually exclu- 
sive time delay measurements. They are designed to accurately 
measure delays within narrow ranges around uncontroversial ap- 
proximative solutions. 

Inconsistent delay measurements often result from insuf- 
ficient data. A noticeable example of such a situation is the 
decade-long controversy about two competing value of the time 
delay of Q 0957+561 ( [Walsh et al.|1979), measured in the opti- 
cal (Af = 415 ± 20 days) by |Vanderriest et al.|l|1989) and in the 
radio (Af = 5 13 + 40 days) by |Lehar et al.|H1992p rhe debate 
was closed by |Kundic et al.| ( |1997| l using additional photomet- 
ric measurements, as opposed to refined methods. A more recent 
example is the delay measurement of HE 1104-1805 (see, e.g., 
|Gil-Merino etaL"|2502] |Pelt et al.|2002| >. In our opinion the re- 
liability of time delay measurement techniques has often been 
overestimated, especially when insufficient data could not hint 
at the presence of potential extrinsic variability. 



3. The time shift formalism 

All three curve shifting techniques presented in this paper are 
point estimators; they can be seen as functions that take n light 
curves as input (n > 2, usually 2 or 4) and return n corresponding 
time shifts r, one for each curve. These optimal time shifts min- 
imize the mismatch between the common intrinsic quasar vari- 
ability features of all the light curves. Pelt et al. ( 1998) follow 
a similar approach, but attribute a shift of zero to an arbitrarily 
chosen curve. Assuming a very robust time shift optimization 
algorithm, such as a brute force exploration for all parameters, 
their formalism is equivalent to ours. Of course, the individual 
time shifts are not informative, but they directly translate into un- 
ambiguous time delay estimations between each pair of curves: 



Af X Y = t\ - Tx 



(1) 



Hence, time shift estimates simultaneously obtained from n light 
curves of a lens system yield n(n - l)/2 dependent - and con- 
sistent - time delay estimations. Note that this would naturally 
generalize to probability distributions instead of point estimates. 

By construction, this trivial time shift formalism avoids the 
selection of any reference curve with respect to which n inde- 
pendent delays would be expressed. This is crucial as it can well 
be that, for example, strong extrinsic variability in quasar image 
A prevents us from measuring the delays Af AB and A* AC , while 
the delay A?bc can be well determined, as observed in the case 
of HE 0435-1223 ( |Courbin et aLpOTT) . Furthermore, methods 
complying with this formalism shift all four light curves of quad 
lenses simultaneously. This is a strong advantage especially in 
presence of extrinsic variability, as using all curves constrains 
the intrinsic variability much better than a pairwise processing. 

Importantly, we will also exploit the common formalism of 
our point estimators by estimating their variance and bias in ex- 
actly the same way (Section^. 

Before describing the three methods, we underline that our 
point estimators all rely on iterative nonlinear optimization al- 
gorithms. As a consequence, they all show a certain amount 
of dependence upon the choice of intial guesses for the time 
shifts. For each lens system to be analysed, we systematically 
evaluate this dependence by running our methods a few hundred 



times on the exact same observed light curves, starting from ini- 
tial shifts randomly selected in a range of generally +10 days 
around a plausible solution. We call the variance of the resulting 
monomodal distributions of time delays the intrinsic variance 
of a delay estimator applied to the particular set of curves. We 
will illustrate this in Section|4] In principle this intrinsic variance 
can be made arbitrarily small, by increasing the robustness and 
precision of the optimizations. In practice, a compromise with 
CPU cost has to be found. We have implemented our methods 
so that their intrinsic variance is significantly smaller than the 
other sources of error. In any case, the intrinsic variance will be 
part of the total uncertainty evaluation. Furthermore, we will al- 
ways use the mean of these distributions as our best time delay 
estimations between observed light curves. 



4. Method 1 : simultaneous spline fit 

Our first method fits a single continuous model to all data points 
of the light curves, simultaneously adjusting time and magnitude 
shifts between these curves so to minimize a x 2 fitting statistic 
between the data points and the model. We will designate this 
common model as intrinsic, even when in practice microlens- 
ing might prevent us from getting access to the pure intrinsic 
variability of the source quasar. The idea of fitting such a single 
model to shifted light curves defines a whole family of existing 
time delay measurement methods. These techniques differ by the 
mathematical representation of the intrinsic curve; for instance, 
|Press et aL ( 1992 1 use a Gaussian proces s, Lehar et al. ( 1992[ > use 
Legendre polynomials, Barkana (1997) uses cubic splines with 
equidistant knots, Burud et al.| ( |200lT use a regularized numer- 
) et al.l (120061) use a line 



ical model, |Cuevas-Tello et aL] p0 06 ) use a linear combination 
of Gaussian kernels, and Vakulik et al.| ( |2009] > use a linear com- 
bination of sine functions. 

In presence of independent "extrinsic" variability such as 
slow quasar microlensing, light curves will not adequately over- 
lap for any shifts in time and magnitude. It is easy to conceive 
that mismatch of this type can lead to strongly biased time de- 
lay estimations. It is therefore mandatory to explicitly model the 
extrinsic variability, to the price of an increased number of free 
parameters. For example, to represent the microlensing in their 
high quality light curves of the lensed quasar HE 0435-1223, 
Kochanek et al. (2006) use independent quadratic polynomials 
for each season. 

Models representing the relative extrinsic variability act sim- 
ilarly to high-pass filters on the data. If these models are as 
flexible as the intrinsic curve, they can compensate for wrong 
time shifts, and the information of the time delay is lost. This 
is our main motivation in proposing a new method that tries to 
locally adapt the flexibility of the models to the peculiarities of 
the curves. 

4.1. Free knot splines: the principle 

We use so-called free knot B-splines to represent both the in- 
trinsic and the extrinsic variability of the light curves. A spline 
is a piecewise polynomial function, and its knots are the loca- 
tions where the polynomial pieces connect. We will only con- 
sider splines of degree 3, i.e., with continuous second derivatives 
all across the curves. For these free knot regression splines, not 
only the polynomial coefficients but also the knot positions are 
seen as free variables to be optimized. These splines yield sig- 
nificantly better fits than uniform splines, for a given number of 
knots. Furthermore, they do not introduce any arbitrary discrete 
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Fig. 1. Illustration of the free knot spline technique. The data points are shifted COSMOGRAIL light curves of the quadruply lensed 
quasar HE 0435-1223, published in |Courbin et al.| ( |201 1[ ). A reasonable spline representing the intrinsic variability is shown in 
black (initial knot step 77 = 30); knots are shown as vertical ticks. The red, green, and violet splines represent the relative extrinsic 
variability corrections, applied to the observed light curves A, C and D (respectively), so that they match to the common intrinsic 
spline. These extrinsic splines are plotted relative to the dashed grey line. The optimization starts with uniformly distributed knots; 
note how the knots tend to migrate to areas in which they are required by well constrained patterns of the data. The blue and orange 
curves are alternative intrinsic splines (shown without knots to avoid clutter) with inadequate knot densities of rj — 50 and 10 (see 
text). 



grid in the model, which is of high importance for our applica- 
tion. Ideally we want our models to be shift-invariant in terms of 
their ability to fit any given pattern. 

In the case of fixed knots, finding the unique least-squares 
spline approximation to some data points is a linear problem. 
This property does not hold for free knot splines: optimizing the 
knot positions to minimize the x 1 requires nonlinear parameter 
estimation. The nonlinear optimization is particularly difficult, 
as the motion of the knots leads to many local optima and sta- 
tionary areas in the parameter space. 

Molinari et al.| (2004) present an efficient algorithm named 
"Bounded Optimal Knots" (BOK) to optimize the knot locations 
of least-squares spline approximations. The authors recall that 
fitting a free knot spline is a problem that can be separated in 
a linear and a nonlinear part (Golub & Pereyra 1973). For any 
given knot configuration, the computation of the corresponding 
optimal spline coefficients remains linear, and thus fast - in our 
case using scipy" 



Jwrappers around FITPACK ( Dierckx 



1995) 



The main idea of BOK is to wrap the linear coefficient compu- 
tation inside an iterative bounded optimization of the knots. The 
bounded optimization guarantees a well-defined minimal dis- 
tance between the knots, by keeping them confined into disjoint 
windows. This scheme avoids the "coalescence" (i.e., superpo- 
sition) of knots, that would correspond to unwanted discontinu- 
ities in the derivatives of the spline. Following an idea of the 
Evolutionary BOK algorithm (also described by Molinari et al. 
2004| we update the bounds of the windows once the knot lo- 



cations have been robustly optimized, and iteratively repeat the 
process. This mechanism effectively pushes the window bounds 
to follow their knots, yet always ensuring a minimal knot dis- 
tance. 

Fitting a single spline to fixed data points is only a fragment 
of the curve shifting problem; our model for the light curves 
not only consists of a common intrinsic spline, but also of sev- 
eral independent extrinsic splines. In addition, to make our curve 



1 Scientific Tools for Python |Oliphant|2007 1 
|http : / /www . scipy . org/| 



shifting technique efficient, we adjust the time shifts between the 
curves simultaneously with the splines, instead of performing in- 
dependent fits for different trial delays. In mathematical terms, 
we aim at finding the global minimum of 



X 



Y^Y^ ['"17 - s(tij + Td - Mtij)] 2 



(2) 



''=1 7=1 



in which n is the number of light curves with Nj photometric 
points (tip mij±crij), Ti are the time shifts, s is the intrinsic spline 
and /u, are the extrinsic splines. 

We minimize this x 2 m an iterative process, in which the 
splines and the time shifts get optimized one after the other, us- 
ing custom strategies for each parameter. Several formal diffi- 
culties arise as the "footprint" of the mixed data points evolves 
when the time shifts are modified; we have for instance to stretch 
the knot locations so that they follow the extent of a spline's 
domain. For details of the admittedly intricate but modular op- 
timization procedure, we refer the reader to the documentation 
accompanying the source code. 

4.2. Free knot splines in practice 

In Fig.[T] we show 3 seasons of a spline fit obtained for the light 
curves of the quadruply lensed quasar HE 0435-1223 (Courbin 
et al.|201 1) . The intrinsic spline, going through the shifted data 
points, is shown in black. The red, green, and purple splines il- 
lustrate extrinsic variability for which the data points have been 
corrected (see caption). Vertical ticks across the splines indicate 
the knot epochs. 

Each spline is parametrized by these knot epochs, as well 
as the associated coefficients. The curve shifting starts with 
equidistant knots, and flat splines. Before running the optimiza- 
tion, one has to chose a number of knots for the intrinsic spline 
(e.g., one knot every 30 days, rj = 30), the number of knots 
for each extrinsic spline (e.g., one knot every 150 days, depend- 
ing on the apparent microlensing variability in each individual 
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Fig. 2. Distributions of time delays obtained by running the free knot spline technique on always the same data, starting from 
randomized initial time shifts. The light curves are from Cour bin et al.| l |20l"T) , an excerpt is shown in Fig.[T] The colors encode the 
initial knot step rj of the intrinsic spline, spanning a large range corresponding to a factor 5 in the number of knots. Superposed to 
the histograms are scatter plots of the minimal x 2 values obtained by the optimization. We stress that the histograms shown here are 
not to be mistaken for probability density functions of time delay measurements on HE 0435-1223. They only illustrate the intrinsic 
variance, i.e., the finite precision of the optimization algorithm, as applied to a high quality data set. 



curve), and the respective miminum knot distances, e (e.g. 10 
days). These numbers remain unchanged by the fitting; they con- 
trol the global flexibility of each spline. 

How sensitive is the technique to these choices ? In Fig.|2]we 
illustrate the effect of the initial uniform knot step r\ of the intrin- 
sic spline on the resulting time delay measurements; a smaller 
step corresponds to a larger number of knots. These time delay 
histograms are obtained by repeatedly running the optimization 
on the same data, starting from random initial time shifts, uni- 
formly distributed within a range of ±10 days around the typ- 
ical best fitting solutions. We also randomly shuffle the order 
in which the optimizer processes the microlensing splines, to 
marginalize over any possible asymmetry introduced by our iter- 
ative spline fitting algorithm. For each number of knots, the dis- 
tributions of measured time delays display a finite scatter. This 
is the intrinsic variance introduced in Section [3] It depends on 
the robustness of the x 1 minimization algorithm, and thus also 
on the complexity (degeneracies, local minima) of the^ 2 hyper- 
surface. As we observe, the result of our optimization is by no 
means global. A visual inspection of the splines reveals that the 
knots tend to settle in a few different repeating configurations. 
This intrinsic variance will naturally contribute to the error bar 
that we attribute to a time delay measurement. 

The influence of the step rj on the centroids of the mea- 
sured delays in Fig.|2]is surprisingly small, considering that the 
range of tested values covers a factor 5 in the number of knots. 
Visualizing the intrinsic spline fits (3 examples are shown in Fig 
[TJ, an initial step of rj — 50 days is clearly too large for this 
particular data set, as the resulting spline is not able to follow 
obvious intrinsic trends. On the other hand, setting 77 = 10 days 
yields far too many knots and the spline tends to fit the noise of 
individual curves. As expected, we observe in Fig.|2]that delib- 
erately selecting a too large number of knots leads to an increase 
of the intrinsic variance of the method. This behaviour is eas- 
ily reproduced when changing the knot density of the extrinsic 
splines. Too few knots bias the measured time delays, too many 



knots "dilute" them, due to the degeneracies with the intrinsic 
spline. Note that it is well possible to attribute an extrinsic spline 
to each light curve instead of leaving one curve as a reference; 
this leads to obvious degeneracies between the extrinsic splines, 
but it does not systematically increase the intrinsic variance. 

Finally, we note that the choice of the minimum knot dis- 
tance e of the intrinsic spline, in a range from 2 to 15 days, has 
practically no effect at all on the time delay measurements. Only 
a very low minimal distance (e < 5), combined with a high num- 
ber of knots (77 < 15) significantly increases the intrinsic vari- 
ance of the delay measurements, without introducing systematic 
shifts. In all the following, we will use a minimal knot distance 
of 10 days. 

To close this section, we compute the number of free param- 
eters involved in this technique. Every spline has a number n* 
of so-called internal knots, i.e., knots located strictly within the 
temporal range of the data points. These are the knots that are 
free in case of a free knot spline. For given internal knot epochs, 
a cubic spline is defined by + 4 independent coefficients (see, 
e.g., |Molinari et al.|2004||de Boor|1978 i. These can be seen as 
one coefficient per internal knot, plus 2 coefficients for the knots 
located at each extremity. The only other free parameters are the 
time shifts r of the curves. Let us consider the case of a quad 
lens with 500 monitoring epochs spread over 6 seasons, = 75 
knots for the intrinsic spline, and = 15 knots for each of the 
4 extrinsic splines. This corresponds to 2000 data points, and 
(2 x 75 + 4) + 4 (2 x 15 + 4) + 4 = 294 free parameters. Our 
implementation of this simultaneous fit converges in less than a 
minute on a single ordinary CPU. 

We postpone to Section [7] the assessment of the total uncer- 
tainties of delay measurements. 

5. Method 2: variability of regression differences 

Our second curve shifting technique consists of a much less 
parametric approach, and can be summarized as follows: 



5 



Tewes et al.: COSMOGRAIL XI - Techniques for time delay measurement in presence of microlensing 



2005 



2006 



2007 




53200 



53400 



53600 53800 

HJD - 2400000.5 [day] 



54000 



54200 



Fig. 3. Illustration of the regression difference technique, on light curves of HE 0435-1223 ( |Courbin et al.|201 lb. For clarity, only 
the light curves of images A (red) and B (blue) are shown. The Gaussian Process Regressions are shown as red and blue continuous 
mean functions and lcr envelopes. Curve B has been shifted in time with respect to A so to minimize the weighted total variation 
of the A - B difference curve, shown in black. The brown difference curve was obtained by deliberately shifting B by 15 days with 
respect to its optimal position. The curves have been arbitrarily offset in magnitudes for display purposes. 



1. Instead of simultaneously fitting one intrinsic model to all 
light curves, we start by performing an independent regres- 
sion on each individual curve. We can then easily express nu- 
merical difference curves between time shifted pairs of these 
finely sampled regressions. As we work with light curves in 
magnitudes, these difference curves correspond to flux ra- 
tios. 

2. The regression curves are shifted along the time axis so to 
minimize the variability, i.e., structures, of the difference 
curves. In mathematical terms, we propose to measure this 
variability through the "weighted average variation", a con- 
cept inspired by the total variation. This approach minimizes 
the derivative of the difference curves, as opposed to the dif- 
ference curves themselves. 

This method is illustrated in Fig. [3] Regressions of the light 
curves of two quasar images are shown in red and blue. If the 
regressions are shifted in time so to correctly compensate for the 
delays, any intrinsic variability pattern of the quasar cancels out 
in the difference light curves (black curve of Pig . [3]> . In this par- 
ticular situation, the difference curves contain only the relative 
extrinsic variability between the curves. If the absolute extrinsic 
variability is independent for each curve, any statistical prop- 
erty of the relative extrinsic variability between the curves will 
a priori be independent of the time shifts. In practice this is not 
entirely the case, due, e.g., to the finite length of the light curves, 
which is much shorter than the largest time scales of microlens- 
ing variability. 

In contrast, if the regressions are not shifted by the correct 
time delays, the intrinsic variability does not cancel out. As a 
consequence, the difference curves also contain a finite differ- 
ence of the intrinsic quasar variability. If the latter is fast and 
strong enough, this corresponds to clear additional irregularities 



in the difference curves. These can be observed in the brown dif- 
ference curve of Fig. [3] that was obtained by attributing wrong 
time shifts to the regressions. 

5.1. Gaussian Process Regression difference curves 

The purpose of the regression is to allow the measurement of 
relative variability between time shifted light curves, despite the 
highly irregular sampling and season gaps. To adequately weight 
localized variability according to its uncertainty, we need a re- 
gression that expresses not only a most likely value but also a 
confidence interval for this value at each epoch. 

Gaussian Process Regression (GPR) is a powerful formal- 
ism whose predictor curve, a Gaussian process, does not fol- 
low a predetermined parametrized form. Instead, the regression 
curve is constrained in its freedom by a covariance function, that 
describes how correlated two given points of the curve are, de- 
pending on their separation along the time axis. Given (i) such 
a prior covariance function, (ii) a prior for the regression func- 
tion itself, and (iii) the observed data, the GPR yields a Gaussian 
distribution (i.e., a mean value and a variance) for the regression 
value at any interpolation epoch. For a pedagogical introduction 
to GPR, see e.g. |Press et aIJ ( [2TjO"7] l. 

To implement our curve shifting technique, we make use 
of the GPR functionality provided by the pvmcrl python pack- 
age ( |Patil et al.|2010| l. Before computing a regression, we have 
to choose priors for both the covariance and a mean function. 
For the latter, we simply use an uninformative constant func- 
tion, at the mean magnitude of the curve's data points. The 
choice of a prior covariance function is less trivial and certainly 
not unique. Several families of covariance functions are imple- 

2 http : //pypi . python . org/pypi/pymc/ 
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merited in pymc; in all the following we make use of the Matein 
family, with an amplitude parameter of 2 magnitudes, a scale of 
200 days, and a smoothness degree v - 1.5. We observe that this 
choice yields good results for several different COSMOGRAIL 
data sets, in terms of the empirical time delay uncertainties de- 
termined in Section [7] But a priori, these parameters can be fine 
tuned individually for each lens system to be analysed, using this 
same criterion. When experimenting with covariance functions, 
it is of particular importance to ensure that the season gaps are 
interpolated with adequately large variances. 

In practice, for each of the n light curves, we evaluate the 
GPR every 0.2 days. Given some trial time shifts, we express 
the n (n - l)/2 difference curves by subtracting linearly interpo- 
lated magnitudes of the shifted regression curves. Indeed, each 
pair of curves has to be considered only once; for the variabil- 
ity analysis, the difference curve A - B yields the same result as 
B — A. In a similar way, the uncertainties at each epoch of the 
difference curves are obtained by summing the linearly interpo- 
lated variances. We procede by quantifying the variability of the 
difference curves. 



5.2. Minimizing the Weighted Average Variation 

To measure the variability of a difference curve, we define a sim- 
ple scalar statistic, that we will refer to as the Weighted Average 
Variation (WAV). It corresponds to an average absolute value of 
the discrete derivative along the difference curve. We weight the 
terms of this average by the local uncertainty of the difference 
curve, and normalize this weighting so to cancel the influence 
of the varying size of the overlapping regions. For a difference 
curve f(tj) with variance o~ 2 (tj), and N regularly sampled points 
(j - 1 , . . . , AO, the WAV is given by: 



WAV(/) := 



\f'(tj)\ ■ HJ) 



where 



f'itj) = 



f(t J+1 )-f(tj) 



tj+i 



- t; 



and the weights are given by: 



w(j) = 



o-(tj) + aifj + i) 



(3) 



(4) 



(5) 



The time shifts of the regression curves are optimized using 
a nonlinear technique that minimizes a single scalar objective 
function obtained by summing the WAV of all pairwise differ- 
ence curves. Due to the low number of parameters (n time shifts) 
compared to the free knot spline technique, this optimization can 
be made very robust, i.e., leading to a negligible sensitivity to 
initial conditions. The total computing time for a quad lens with 
500 epochs in each curve is of the order of one minute on a single 
CPU. The GPR takes more than half of this time. 



6. Method 3: dispersion minimization 

Our third curve shifting method is broadly inspired by the dis- 



1996). In |Courbin eTaT. 



persion techniques from Pelt et al. 

( |201 l| l, we have already applied this particular time delay es- 
timator to the light curves of HE 0435-1223. It has also been 
used in Eulaers & Magain |( |201 1) . A notable difference to classi- 
cal dispersion techniques is that we make use of a simple linear 



"c 



X 
Y 



m Y (t,.) ± a Y (U) . 



"lX,interp(*i) ± CX,interp(^) 



Heliocentric Julian Date 

Fig. 4. Illustration of the "elementary" dispersion function used 
in the curve shifting technique described in Section [6] The ver- 
tical gray bars represent the terms of the summation of equation 
[6] Note that the last shown point of Y would not contribute to 
the dispersion, as it falls into a season gap of X. This elementary 
dispersion function is not invariant with respect to swapping the 
curves X and Y. However, our total dispersion estimate is sym- 
metric, as we average these elementary dispersion accross all 
permutations of 2 curves among n. Not shown in this sketch are 
the polynomial corrections for extrinsic variability. These cor- 
rections are optimized against the same total dispersion. 



interpolation to form pairs of predicted and observed points, in- 
stead of considering only pairs of closeby observed points. 

The method consists of a nonlinear optimization of the time 
shift of each light curve so to minimize a single scalar objec- 
tive function that quantifies the "mismatch". Dispersion func- 
tions are such objective functions that do not involve any model 
for the intrinsic variability, but only use the relative dispersion 
between the time-shifted points of the light curves. As for the 
free knot spline method (Section [4}, it is necessary to explicitly 
compensate for any slow extrinsic variability before evaluating 
the dispersion for given trial time shifts. We represent this extrin- 
sic variability by low order polynomials added to either the full 
curves or to individual seasons (for an illustration, see Fig. 5 of 
Courbin et al.|20lT) . These polynomials are optimized simulta- 
neously with the time shifts, so to minimize the same dispersion 
function. 

Remains to define the dispersion function. It should be able 
to evaluate the mismatch between n > 2 time-shifted light curves 
of a quasar lens, treating all those n curves equally, i.e., without 
choice of an arbitrary reference curve. We achieve this by first 
defining an elementary dispersion function that operates on pairs 
of curves, and then averaging its value over all n(n - 1) permuta- 
tions of 2 among n light curves. This elementary dispersion func- 
tion between two curves X and Y is illustrated in Fig. |4] We lin- 
early interpolate (and never extrapolate) the light curve X at each 
epoch of Y, provided that the time interval over which this inter- 
polation is done is smaller than 6 = 30 days. Given the typical 
sampling of light curves, this means that we do not interpolate 
across observing season gaps. We then compute a sum of square 
differences between the interpolated magnitudes mx,interpft) and 
the corresponding magnitude measurements m Y (ti) of light curve 
Y: 



('«X,intei-pO/) - m Y (ti)) 
4,interp( f <) + 4^) 



(6) 
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where the summation goes over the N epochs of Y for which the 
interpolation of X was performed, given the above criteria. Each 
term is weighted by a combination of the linearly interpolated 
uncertainty on mx,interp(*;) an d the uncertainty of niy(ti). 

For a set S of n lightcurves, e.g. S = {A, B, C, D), the average 
dispersion is computed by: 



D 2 (S) = 



1 



n(n - 1) 



X D 2 (X,Y) 



(7) 



[X,Y}c S,X±Y 



This dispersion is a function of the time shifts and the extrinsic 
variability corrections of each light curve. In practice we mini- 
mize the average dispersion by using iteratively a simulated an- 
nealing optimizer for the time shifts and a Powell optimizer for 
the extrinsic variability, as implemented in scipy (for a descrip- 
tion of the algorithms, see e.g. Press et al. 2007 chap. 10.12 and 
10.7). 

Due to the sampling and scatter of the light curves, the dis- 
persion function usually has a very rough structure in time shift 
space, resulting in a poorly defined minimum. It is tempting to 
"smooth" this dispersion function. This can be done for instance 
by using a regression instead of a simple linear interpolation, or 
by determining the dispersion minimum through a local fit (see 
e.g. Fig 5. of |Vuissoz et al.|20 08 ). Such measures clearly reduce 
the intrinsic variance (see Section[3]l of the estimator by increas- 
ing its stability against random initial time shifts. But we also 
observe that they often introduce additional bias. We therefore 
prefer to simply use the dispersion function described above, 
tackling its "noisy" minimum with a robust optimization algo- 
rithm. 

7. Empirical estimation of time delay uncertainties 

As described in Section [3] the three curve shifting techniques 
presented in this paper can be seen as recipes that yield a single 
time shift estimation t for each light curve of a lensed quasar. 
In this section, we describe how we proceed to empirically ob- 
tain reliable error bars for time delays between pairs of curves 
(A?xy = Ty — Tx), as measured through these point estimations 
r. This error analysis is done individually for each quasar lens, 
i.e., for each set of observed light curves. 

To compute the time delay uncertainties, we follow a Monte 
Carlo approach. We apply the curve shifting techniques on a 
large quantity of synthetic light curves with known time delays. 
This allows us to assess both the variance and the bias of our 



point estimators, as we relate in Section 7.3 We draw these syn- 
thetic light curves from a comprehensive "generative" model, 
i.e., a mathematical model for randomly generating mock light 
curves from scratch, while controlling the hidden parameters 
such as the time delays. For a given lensed quasar, our generative 
model - whose details are described in the following sections - 
is composed of: 

1. A single intrinsic variability curve, common to all n quasar 
images. We use the intrinsic free knot spline obtained by ap- 
plying our first curve shifting method (Section|4] black spline 
of Fig.[T]i to the observed light curves. 

2. A smooth extrinsic variability curve ("slow" microlensing) 
for each curve of the lens. Again, we directly use the curves 
obtained by the spline fitting technique on the observed data 
(red, green, and purple splines of Fig. [TJ. 

3. An independent "fast" extrinsic variability, which can be 
seen as correlated noise, for each curve. This contribution 
is randomized, i.e., individually drawn for each realization 
of the synthetic curves. 



We always sample from this generative model at the actual 
observing epochs of the monitoring. As a result, we obtain syn- 
thetic curves that closely imitate the intrinsic and extrinsic vari- 
ability, sampling, and scatter characteristics of the real data. If 
the properties of the randomized fast extrinsic variability are 
well adjusted, these synthetic curves are statistically undistin- 
guishable from the observations. As illustrated in Fig. [5] they 
could easily be mistaken for the real light curves. Yet, they have 
known time delays. 

Our first objective of this methodology is to obtain synthetic 
curves that share a very similar "time delay constraining power" 
with the observations, so that we can assume our delay measure- 
ment methods to perform equally well or badly on both. Clearly, 
this constraining power of a set of light curves increases with the 
amount of intrinsic variability, and decreases as this variability 
gets diluted by extrinsic effects such as microlensing, sampling, 
and noise (Eigenbr od et al.| [2005). Furthermore, curve shifting 
methods might not be able to optimally exploit the information 
content of the data points. In particular, they are likely to be bi- 
ased even by those peculiarities of the data that can be indu- 
bitably determined from the observations, such as the interplay 
between large scale extrinsic and intrinsic variability, and sea- 
son gaps. Hence, our second objective in generating synthetic 
curves that mimic as closely as possible the real data is to reveal 
these biases. Provided that a rough but unambiguous estimation 
of time delays can be made, the observed light curves do yield 
some nearly unmistakable information about the intrinsic and 
extrinsic variability. We use this information as a deterministic, 
smooth part of our generative model, and marginalize over the 
unkown true time delays and short scale extrinsic variability. We 
proceed by describing the generative model in detail. 

7. 1 . Model for the intrinsic and slow extrinsic variability 

We directly use the free knot spline fits of our first curve shifting 
technique as a model for the intrinsic and slow extrinsic variabil- 
ity. While quasars frequently show variability on scales of hours 
(see e.g. Stali n et al.|2005| l, we do not add any additional small 
scale variability to the intrinsic spline obtained from the data, 
as this could exaggeratedly increase the time delay constraining 
power of our synthetic curves. 

We observe that the shapes, amplitudes, and the slopes of 
the intrinsic and extrinsic splines are virtually insensitive to any 
plausible time shifts of the light curves around the estimated time 
delays. Therefore, we simply use the set of splines from our free 
knot spline technique as a fixed, deterministic part of our model. 
We have verified that our results do not significantly change if 
we perform a time-shift-constrained spline fit on the observed 
curves for each particular delay that we want to simulate. 

7.2. Model for the randomized fast extrinsic variability 

To add fast extrinsic variability to our synthetic light curves, we 
randomly generate "power law noise", i.e., a time series whose 
Fourier spectrum follows a power law. The characteristics of this 
noise are adjusted individually for each quasar image. This pro- 
cedure can be summarized as follows: 

1 . For each quasar light curve, draw power law noise following 



Timmer & Koenig (1995), using a fine regular sampling of 



e.g. 0.2 days. 

Linearly interpolate these finely sampled signals at the ob- 
serving epochs of the light curves, to obtain a noise contri- 
bution for each data point. 
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Fig. 5. Synthetic curves (black) drawn from a well adjusted generative model (see text), and the corresponding observed data points 
for the lensed quasar HE 0435-1223. For illustration purposes only 3 seasons of 2 quasar images are shown, arbitrarily offset 
in magnitude. The purpose of the generative model is to simulate curves with known time delays that nevertheless mimic the 
observations at best. These synthetic curves are used to evaluate the accuracy and precision of our curve shifting algorithms, using 
a Monte Carlo approach. Note that the strong degeneracies between the extrinsic splines used in the model have no influence on the 
match between the synthetic and observed light curves; in the scope of our simple generative model we do not need to know if, for 
instance, a light curve is amplified by microlensing in image A, or demagnified in B. Only relative extrinsic variability is relevant. 
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Fig. 6. Residuals of the observed (red, blue) and the synthetic (black) light curve of images A and D of HE 0435-1223, as obtained 
by applying the free knot spline method. The power law noise used for the synthetic curve is adjusted so that these residuals show 
on average (i) the same standard deviation and (ii) the same number of "runs" as those of the observations. As described in the text, 
the power law noise has been rescaled to locally match the scatter amplitude of the observed curves, as can be clearly seen in this 
figure. 



3. Locally rescale the noise, so that its amplitude follows the 
scatter in the observed curves. 

4. Run the free knot spline technique on the synthetic curves, 
and analyse the residuals. 

5. Iteratively repeat the above steps, adjusting the parameters 
of the power law noise until the residuals obtained at step 4 
are statistically compatible with the residuals obtained from 
the real observations. 

Note that we use this model to generate both correlated ex- 
trinsic variability and independent shot noise at once. Therefore, 
the regular sampling used in step 1 should be chosen sig- 
nificantly finer than the minimal distance between observing 
epochs. 

Let us now revisit the steps leading to a well adjusted fast 
extrinsic variability in more detail, starting with the power law 
noise. The idea of Timmer & Koenig's algorithm is to generate 
random amplitude and phase coefficients in the discrete Fourier 
space, and then build the real signal by inverse Fourier trans- 
form. We limit the generation of random amplitudes to a finite 
window of the frequency domain, thus avoiding to affect the 



large scale variability of our extrinsic model curves. The power 
law noise is controlled by the following parameters: 

- the exponent of the spectrum power law; ft — -2 corres- 
ponds to a random walk, f3 — is white noise). 

- A: a scaling factor for the generated noise 

- fain- low cut-off of the frequency window. We set fain to 
1/500 day -1 , as any lower frequencies are by construction 
well represented by the extrinsic free knot spline fit. 

- faax- high cut-off of the frequency window. We take this as 
the maximum (Nyquist) frequency of the 0.2 day sampling. 

The free parameters /3 and A have a direct influence on the 
accuracy and precision that time delay estimators will achieve 
on the synthetic curves; these are the parameters that will be ad- 
justed for each light curve. We observe that reasonable changes 
in the parameters fain and faax (i.e., the sampling) have a negli- 
gible effect compared to the influence of (3 and A. 

Interpolating this power law noise at the observing epochs 
leads to a contribution e, for each data point. The next step is 
to locally adapt the amplitude of these e, to the observed scatter 
in the light curves. This is empirically well motivated, as ob- 
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Fig. 7. Top: histogram of residuals obtained by running the free knot spline fit technique on the observed light curves of HE 0435- 
1223 (green), and the corresponding synthetic curves (gray). Bottom: distribution of the z r parameter computed from these residuals. 
Only one set of observed light curves is available, while the distributions related to the synthetic curves are averaged over 1000 
realizations. In this example, the parameters of the generative model have already been adequately adjusted; the synthetic curves 
shown in Fig. [5] and [6] are drawn from this adjusted model. 



served light curves often contain subregions that clearly display 
different smoothness. To perform this local rescaling for a given 
curve, we use the residuals r l o b s obtained by running the free 
knot spline technique on the observed light curves (red and blue 
points of Fig.[6]l. We normalize the absolute values of these noisy 
residuals to an average of one, and smooth the resulting signal 
using a median filter with a window of 7 observing epochs: 



r / kiobsl 

s: = median - 



\<kobsl> 



,je{j~3,...,* + 3} 



(8) 



where (|r t, s |) denotes the average absolute residual taken over all 
points of the curve. The rescaled e, that we add to our synthetic 
light curves are given by: 



^/,rescale — 



(9) 



This local rescaling does not affect the average amplitude of the 
synthetic noise, which is still controlled by A. 

We proceed by running the spline fitting technique from 
scratch on the set of synthetic curves, i.e., disregarding any infor- 
mation from the generative model. This yields residuals, shown 
as black points in Fig. [6] that can be equitably compared with 
the residuals of the observations. 

To fine-tune A and /?, we quantitatively analyse these residu- 
als. For this, we make use of two simple statistics: their standard 
deviation <x, and their number of "runs" r. A run is a sequence of 
adjacent either positive or negative residuals. The total number 
of positive and negative runs is a statistic that can be used to test 



the hypothesis that successive residuals are independent (Wall & 
Jenkins 2003 chap. 5). For large samples of N truly indepen- 
dent observations with N+ positive and negative residuals, 
the number of runs r is normally distributed with an expectation 
value and a variance respectively given by 



2N + N- 
N 



+ 1 and crz 



(Jl r - l)(jXr - 2) 

N-l 



(10) 



In practice we "normalize" a measured number of runs r to this 
hypothesis of independent residuals: 



Zr = 



r- Hr 



CTr 



(ID 



Applying the free knot spline technique on COSMOGRAIL 
light curves, we typically observe a number of runs in the resid- 
uals at least about 2cr r lower than /j. r , i.e., z r ~ -2, meaning that 
the residuals are indeed correlated. Fig. [7] shows a comparision 
in terms of residual distribution and z r between the residuals ob- 
tained from synthetic curves (gray distributions, averaged over 
1000 realizations) and from the observations of HE 0435-1223 
(in green). One can now adjust the parameters A and /3 of the 
power law noise so that the average standard deviation and num- 
ber of runs of the residuals obtained from the synthetic curves 
match the values measured on the observations. The procedure is 
straightforward, as the residuals' number of runs directly relates 
to p, and their standard deviation to A, without much crosstalk. 
Hence a good match such as shown in Fig. [7J is obtained after 
just a few trial and error iterations, simultaneously modifying A 
and p of each quasar image. 

Finally, we attribute to all our synthetic light curves the pho- 
tometric error bars of the observed data. We stress that nowhere 
else in our generative model we made use of these photometric 
error bars. Photometric error bars only describe the experimen- 
tal measurement uncertainty, which is not necessarily the only 
source of short scale "mismatch" between multiple light curves. 
Furthermore, for all methods presented in this paper, the photo- 
metric error bars essentially only act as relative weights between 
the data points. As a consequence, our time delay measurements 
- including their uncertainty estimates - are not directly influ- 
enced by a potential systematic under- or over-estimation of the 
photometric errors. We see this as a very desirable feature. 



7.3. Quantifying variance and bias of our time delay 
estimators 

With the well adjusted generative model in hand, we proceed 
by drawing typically 1000 synthetic curve sets, choosing model 
time shifts in a range of several days around the point estimates 
obtained from the observations. We run the curve shifting tech- 
nique on these synthetic curves, using the same parameters as 
for the estimation perfomed on the real observations. We always 
start the techniques from random initial time shifts, to take into 
account the potential intrinsic variance of the curve shifting tech- 
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Fig. 8. The "trial curves", i.e., a set of artifical 4 season long lightcurves of a quad lens, used in place of real observations for the self- 
consistency check performed in Section [8] These curves are drawn from a generative model that closely mimics COSMOGRAIL 
observations. Note the prominent microlensing on large time scales, but also the presence of obviously extrinsic variability on scales 
of weeks (e.g., middle of third season of curve C). The true time delays are A?ab = -5.0, A?ac = -20.0 and A?ad = -70.0 days. In 
this figure the curves are shifted according to these delays. 



nique described in Section [3] We then compute the resulting er- 
rors between the estimated and the true time delays. 

To analyse these errors, we bin the synthetic curve sets ac- 
cording to their true time delays, individually for each quasar 
image pair. This allows us to check if the uncertainties that we 
will derive do not strongly depend on the true time delays. Fig. 
10 illustrates the procedure in the case of a quad lens. In each 
bin, we estimate the variance and the bias of our time delay esti- 
mators, by computing the sample variance (of^) and the average 
(cr sys ) of the delay measurement errors, respectively. 

By construction, the main practical origin of the bias has to 
be related to those properties of light curves that are kept con- 
stant in the Monte Carlo simulations, such as the interplay be- 
tween intrinsic and slow extrinsic variability, season gaps and 
sampling. This bias quantifies the accuracy of a time delay es- 
timation. Note that such a characterization of accuracy has of- 
ten been neglected in past time delay measurements; it cannot 
be obtained through resampling techniques that don't involve a 
generative model, such as jackknifing or bootstrapping. 

On the other hand, the variance of our time delay estimators 
mainly results from their sensitivity to the fast extrinsic vari- 
ability and noise, which we randomize in our synthetic curves. 
This variance describes the precision of the time delay measure- 
ments. It is often possible to increase the precision of a curve 
shifting technique, by somehow smoothing either the input light 
curves or the cost function to be optimized. Using the apporach 
described in this paper, we can verify if such an increase in pre- 
cision is not obliterated by an even higher decrease in accuracy. 

Lastly, we obtain a comprehensive 1-sigma error bar <x tot for 
the time delay measurement between each quasar image pair by 
combining the maximal bias and the maximal variance observed 
in the bins of true time delays: 



Ctot 



max CTji) 

A?Tme 



max 07, 

A'Tme 



(12) 



its bias, instead of simply combining the bias and the variance 
into the total error budget. To avoid any circular argumentation, 
we do not perform such a correction in this paper. The circu- 
larity would arise as we have to assume time delays for the ob- 
served data in order to build the generative model of the syn- 
thetic curves. If these initial guesses for time delays are signif- 
icantly biased, the generative model will contain erroneous in- 
trinsic and extrinsic variability patterns, yet still mimic the ob- 
served curves. Hence, running the curve shifting techniques on 
the synthetic curves would yield measured delays close to their 
true delays, and thus not fully reveal the initial errors. 

We prefer to simply use the amplitude of the bias, gauged 
through the above procedure, as a quality criterion of each 
method. 



8. Application to trial curves, discussion 

In this section, we apply our three curve shifting techniques, as 
well as the error bar computation procedure of Section [7J to a 
set of artifical curves with known time delays. This allows us 
to check the consistency of the error bar computation, and to 
illustrate some general observations about the latter. 



8. 1 . The trial curves 

We use the scheme described in Section [7] to generate a single 
set of 4-season long light curves, that mimic the data of one of 
the quadruply imaged quasars monitored by the COSMOGRAIL 
collaboration. In the following, we refer to these artificial curves 
as the "trial curves"; they are shown in Fig. [8] The trial curves 
include realistic intrinsic variability and sampling, as well as ob- 
vious long- and short-scale microlensing. We choose time delays 
of A?ab = -5.0, Af A c = -20.0 and At AD = -70.0 days. 



In situations where bias and variance do not significantly de- g 2 Analysis 



pend on the true time delays, crf ol simply corresponds the mean 
squared error (MSE) of our time delay estimators. Fig. [10] corre- 
sponds about to such a situation. 

As will become evident in the next section, it appears very 
tempting to empirically correct each time delay estimation for 



From here on we process the trial curves as if they would be real 
observations. We use our time delay estimators on them, disre- 
garding any prior information about the true delays and intrinsic 
or extrinsic variations. For the spline method, we choose an av- 



11 



Tewes et al.: COSMOGRAIL XI - Techniques for time delay measurement in presence of microlensing 



AB 



-6.1 ±(1.4, .1.9) 

-4.9±(;i.l, 0.4) 

I f 1 

-5.2±(lU, 1.0) 



Dispersion-like technique 
Regression difference technique 
Free knot spline technique 



-10 



AC 

-22.3 ±(1.9, 1.3)| 




BC 

-16.2 ±(1.7, ;0.7) 






-21.2 ±(1.4,10.3) 




-16.2 ±(1.3, 10.4) 






-21.3±(1.6, b.6) 




-16.1 ±(1.4, 0.5) 
















-26 -24 -22 -20 -18 




22 -20 -18 -16 -14 - 


12 




' 1 ' 1 

AD 

-69.1 ±(1.6, 0.4) 




BD 

-j-63.0 ±(1.4, 1.8) 




1 

CD ; 

| -46.8 ±(1.7, 1.4) 


-70.8 ±(1.2, 0.8) 




-65.9 ±(1.2;, 0.8) 




-49.6±(1.4, 0.7) 
i ■ j 


-68.9 ±(1.1, 0.5) 




-63.7±(1.1, 0.6) 




|-47.6±(1.5, 0.4) 


74 -72 -70 -68 -66 - 
Delay [day] 


64 


-68 -66 -64 -62 -60 
Delay [day] 





52 -50 -48 -46 -44 - 
Delay [day] 



42 

Delay [day] Delay [day] Delay [day] 

Fig. 9. Time delay estimations and associated uncertainties obtained by each of the three curve shifting techniques from the trial 
curves in Fig. [8] For each delay measurement, the random error bar, cr ran and the bias, <x sys , are given in parentheses. The drawn 
error bars depict the total error, <x tot , as obtained from equation[T2] The dashed vertical lines show the true delays of the trial curves 
analysed in this section. 
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Fig. 10. Error analysis for the trial curves shown in Fig. [s] For each curve shifting technique, the estimates of the time delay uncer- 
tainties are obtained by analyzing delay measurement errors on 1000 synthetic curves with randomly chosen true time delays. In 
each panel, the delay measurement errors (vertical axis) are represented against the true delays of these synthetic curves (horizontal 
axis). Instead of showing a scatter plot, the averaged measurement error (i.e., the bias <x sys ) in bins of true delay is shown by the 
shaded rods, while the error bars represent the standard deviation cr ran . The bin intervals are indicated by the light vertical lines. 
For increased clarity, the results from the three techniques are drawn side by side within each bin. To give an example, we can 
observe on this figure that the dispersion-like technique systematically underestimates the delay A?ab by about 1.5 days, no matter 
the true delay used in the synthetic curves. For this specific curve shifting technique and pair of curves, the bias is larger than the 
standard deviation of the measurement errors, hence underlining the importance of evaluating the bias. Recall that the Monte Carlo 
procedure described in Section [7] allows us to perform the analysis summarized in this figure for any set of observed light curves. 
The knowledge of the true delays, as available for the artificial curves examined in this section, is not used at any step here. 
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erage knot step of 20 days for the intrinsic spline, and of 150 
days for the four extrinsic splines. The dispersion-like method 
is allowed to correct for extrinsic variability using independent 
linear trends on each of the 16 seasons. All other parameters of 
the curve shifting techniques are fixed to the default values pre- 
sented earlier. 

As described in Section [3] we systematically run the curve 
shifting methods several hundreds of times on the same light 
curves, starting from randomized initial time shifts, and we use 
the mean of the resulting time delay distributions as our best es- 
timation. This leads to the centroids of the time delay estimates, 
presented in Fig. [9] The dashed vertical lines indicate the true 
time delays of the trial curves analysed in this section. 

Finally, we compute error bars for these time delay measure- 
ments following the procedure of Section [7] In doing so, the 
curves used for the Monte Carlo runs are drawn from a new gen- 
erative model built from scratch from the trial curves. Note that 
despite this precaution, the same kind of recipes were used to 
build the trial curves and the synthetic curves used in the Monte 
Carlo procedure. The analysis performed in this section should 
therefore be seen as a in a self-consistency check of our time 
delay uncertainty estimation procedure. 



We present this error analysis in Fig. 10 For each bin of true 



delay used in the Monte Carlo simulations, we show the mean 
cr sys and standard deviation <x ran of the measurement errors as 
a shaded rod and as an associated error bar, respectively. From 
this figure, we directly obtain the total uncertainty cr tot following 
Eq. 12 for each technique and pair of quasar images. These cr tot 
are used as error bars in Fig. [9] 



8.3. Discussion 

Several important observations can be made from Figs.|9|&[T0| 
We recall that the analysis leading to these figures did not use 
any knowledge of the true time delays between the trial curves 
at any time, except for plotting the dashed vertical lines in Fig. 

121 

1. We observe in Fig. [9] that, on average, the total lcr error 
bars computed by our Monte Carlo approach compare well 
with the actual errors made by our point estimators on the 
trial curves. In particular, this holds for the dispersion-like 
technique, that suffers in this example from the largest bi- 
ases. The self-consistency check of our uncertainty estima- 
tion procedure is successful. 

2. Fig. [10] clearly shows that, in the case of these trial curves, 
our estimates of both the bias and the variance of each curve 
shifting method does not depend much on the true time de- 
lays of the Monte Carlo simulations. This optimal situation 
is often not observed for shorter or lower quality curves, mo- 
tivating our conservative decision to combine the maximum 
bias and variance to get the total error bar following equation 

3. We can make an additional observation about the direction 
and magnitude of the bias. Lets consider for example the 
time delay Af BD . From Fig.|9]we see that the lowest estimate 
of the delay is obtained by the regression difference tech- 
nique (green), followed by the spline technique (blue) and 
then by the dispersion-like technique (red). Independently, 



Fig. 10 shows that on our Monte Carlo simulations, the re- 



mated from the Monte Carlo simulations is consistent with 
the sequence of time delays measured on the trial curves. A 
similar statement holds for nearly all pairs of quasar images, 
which is remarkable. We conclude that our uncertainty esti- 
mation procedure of Section [7] is at least in part successful 
in separating the bias due to flaws of the methods from the 
random error that genuinely originates from the data. Note 
that this aspect can be analysed for real data, as it does not 
involve the knowledge of the true time delays between the 
trial curves. 



8.4. Investigating error correlations between pairs of quasar 
images 

The error bars computed for a given delay with our a method 
marginalize over the other delays of the same lens. However, 
the delay measurements are not independent in their very na- 
ture. For example, At^c = At^B + At^c, hence any estimation 
of At ac correlates positively with both A?ab and AfBC- The for- 
malism presented in this paper does not fully exploit this joint 
information. 

We can nevertheless use the measurements obtained from the 
Monte Carlo simulations to explore correlations between delay 
estimation errors of different pairs of quasar images. In Fig. 1 1 
we show this correlation for pairs of curves, marginalizing over 
the true delays used in the Monte Carlo simulations. 

Displacements of these distributions with respect to the zero- 
error crosshairs indicate bias, while the width of the distributions 
indicates the variance of the time delay estimators. As expected, 
we observe positive and negative correlations, that is ellipsoids 
with oblique orientations, for pairs of delays that involve a com- 
mon quasar image. Very importantly, at least for the specific set 
of light curves used in this work, we do not observe correlations 
between the 3 "disjoint" pairs of delays AB/CD, AC/BD, and 



AD/BC, located along the short diagonal of Fig. 1 1 



gression difference technique tends to underestimate this de- 
lay, the spline technique tends to slightly overestimates it, 
while the dispersion-like technique overestimates it on aver- 
age by ~ 1.5 days. Hence the "order" of the biases as esti- 



This analysis is to be made for the light curves of every spe- 
cific quadruply imaged quasar, as the sensitivity of the curve 
shifting algorithms certainly depends on the details of the fea- 
tures in real quasar light curves. If the results are qualitatively 
similar to those shown in Fig. 11 we deem as appropriate to 
present our time delay measurements in the form of indepen- 
dent estimates, marginalizing over all correlations. Note that in 
practice a joint probability distribution is only really interesting 
for quadruply imaged quasars in which several delays are well 
measurable. 



8.5. Getting a single answer 

It remains to be decided how to "combine" the delay measure- 
ments obtained from our different techniques. We cannot com- 
bine the measurements as if they were independent, for instance 
by multiplying associated probability density functions. Indeed, 
no matter how different the methods are, they all use the one 
and only realization of the observations. In particular, it can well 
happen that the time delay error bars, as derived by our approach 
for each method, are significantly larger than the apparent spread 
of the delay point estimates obtained from the different methods 
(see e.g. panel BC of Fig. [9j. The spread of different estimates 
is clearly not an indication of the degree of knowledge of a time 
delay. 

Hence, when analysing real observations, we propose to sim- 
ply select, individually for each lens, the method that tends to 
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Fig. 11. Correlations of delay measurement errors for the quad 
lens analysed in Section [8] The distribution of delay measure- 
ment errors on 1000 synthetic curves is shown for each quasar 
image pair against each other pair, marginalizing over the true 
delays of the synthetic curves. The central crosshair of each 
panel indicates zero error and the ticks are days, i.e., each axis 
shows errors from -5 to +5 days. Displacements of the distri- 
butions with respect to the crosshairs indicates bias, while the 
width of the distributions indicates variance of the time delay 
estimators. For clarity, only single contours at half of the maxi- 
mum density are shown for two of the techniques. The measure- 
ment errors shown in this figure are exactly the same as used in 



Fig. 10 



display the smallest total error bar. One can directly use its point 
estimates and the associated lcr error bars as favored answer. 

Average delays between competitive methods could also be 
used, but in any case the size of the uncertainties are not to be 
divided by the square root of the number of methods contributing 
to these averages. 



data points. This method has no explicit model for the com- 
mon intrinsic variability of the quasar, but it involves poly- 
nomial models for the extrinsic variability. It has previously 
been applied in |Courbin"e t al. (20lT]l. 

A common point of the methods is that they yield point estimates 
(i.e., single values) for the time delays in a self-consistent way 
by sharing the formalism of time shifts described in Section [3] 

In addition, we present a Monte Carlo approach to estimate 
a comprehensive error bar for each time delay measurement. 
This procedure is based on synthetic curves that try to mimic 
as much information about the intrinsic and extrinsic variabil- 
ity as the observations unmistakenly reveal. Provided that we 
accept the generative model of the synthetic curves, the curve 
shifting techniques themselves are reduced to "recipes". Given 
a set of light curves, we can select methods based solely upon 
their empirical performance. This effectively shifts the require- 
ment of formal justification from the curve shifting techniques 
to the synthetic curves on which these techniques are evaluated. 
As a consequence, the recipes can even be fine tuned for each 
data set. 

Finally, we have verified the self-consistency of our time 
delay uncertainty estimation using a trial set of artificial light 
curves. Importantly, the availability of 3 different curve shifting 
techniques allows to perform consistency checks of our bias de- 
termination when analysing real observations, i.e., data without 
known time delays. In other words, we acknowledge that any 
curve shifting technique may display residual biases, due for ex- 
ample to particular patterns of slow microlensing variability, but 
we provide a mean to credibly quantify this bias, whatever be its 
origin. 

The methods described in this paper will be used as stan- 
dard benchmark to obtain time delay estimations from the light 
curves of the COSMOGRAIL monitoring program. We have im- 
plemented the curve shifting techniques, the error bar estimation, 
and the generation of all the figures of this paper in form of a 
modular python toolbox. This package, called PyCS, as well 
as a tutorial including the trial curves of Section [8] are available 
from the COSMOGRAIL websittfD 
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9. Conclusions 

In this paper, we describe 3 independent "curve shifting tech- 
niques" to measure time delays between resolved light curves 
of gravitationally lensed quasars. All these methods address the 
presence of variable microlensing in the light curves and can be 
applied to lens systems with any number of images. 

1. The free knot spline technique simultaneously fits one 
common intrinsic spline, and independent smoother extrin- 
sic splines to the light curves. The curves are shifted in time 
so to optimize this fit. 

2. The regression difference technique shifts regressions of 
the curves so to minimize the variability of the differences 
between them. It is nearly parameter free, and does not re- 
quire an explicit model for the microlensing variability. 

3. The dispersion-like technique shifts the curves so to min- 
imize a measure of the dispersion between the overlapping 
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